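Virtual assistants use automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem because of the large number of frequently changing named entities. In addition, the resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion of non-terminals at model creation time, integrates directly with the FST framework, and is complementary to n-gram models. We obtain a 10% relative word error rate improvement on long-tail entity queries compared with using a similarly sized n-gram model without our method.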
translated by 谷歌翻译
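The idea of not expanding non-terminals when the model is built can be illustrated with a toy template grammar whose entity slots are resolved only when a query is scored. This is a minimal sketch, not the paper's deterministic approximation. The templates, entity lists, probabilities, and function names are hypothetical. A real system would compile the grammar into a weighted FST and combine it with an n-gram model, rather than scoring strings in Python.

```python
import math

# Hypothetical template grammar: tokens starting with "$" are non-terminals.
TEMPLATES = {
    ("play", "$ARTIST"): 0.6,
    ("call", "$CONTACT"): 0.4,
}

# Hypothetical entity distributions per non-terminal. They are looked up at
# scoring time, so the grammar is never expanded into every possible sentence.
ENTITIES = {
    "$ARTIST": {("taylor", "swift"): 0.5, ("adele",): 0.5},
    "$CONTACT": {("mom",): 0.7, ("john", "smith"): 0.3},
}


def _match(template, words):
    """Probability that `template` derives exactly `words` (lazy expansion)."""
    if not template:
        return 1.0 if not words else 0.0
    head, rest = template[0], template[1:]
    if not head.startswith("$"):
        # Terminal symbol: must equal the next word.
        return _match(rest, words[1:]) if words and words[0] == head else 0.0
    prob = 0.0
    for entity, p_entity in ENTITIES[head].items():
        if words[: len(entity)] == entity:
            prob += p_entity * _match(rest, words[len(entity):])
    return prob


def sentence_logprob(words):
    """Log-probability of a word sequence under the grammar (-inf if not derivable)."""
    total = 0.0
    for template, p_template in TEMPLATES.items():
        total += p_template * _match(template, tuple(words))
    return math.log(total) if total > 0 else float("-inf")
```

With this layout, adding a new artist only changes the `$ARTIST` table and no template has to be re-expanded. That is the property the abstract emphasizes.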
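This paper presents a unified framework to (i) locate the ball, (ii) predict player poses, and (iii) segment player instance masks in team sports scenes. These problems are of high interest for automated sports analytics, production, and broadcast. A common practice is to address each problem individually by exploiting generic state-of-the-art models, such as Panoptic-DeepLab for player segmentation. Besides the increased complexity that comes from multiplying single-task models, using off-the-shelf models also impedes performance because of the complexity and specificity of team sports scenes, such as strong occlusion and motion blur. To circumvent these limitations, our paper proposes to train a single model that predicts ball and player masks and poses by combining the part intensity field and spatial embedding principles. Part intensity fields provide the ball and player locations, as well as player joint locations. Spatial embeddings are then exploited to associate player instance pixels with their respective player center, and also to group player joints into skeletons. We demonstrate the effectiveness of the proposed model on the DeepSport basketball dataset, achieving performance comparable to SoA models that address each individual task separately.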
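The role of the spatial embeddings can be sketched as a nearest-center assignment. Each pixel is shifted by a predicted offset and joins the instance whose center, for example a peak of the part intensity field, lies closest to the shifted position. This is a simplified sketch under stated assumptions: offsets and centers are given as inputs, assignment is a hard nearest-neighbor rule, and the function name is hypothetical. It does not reproduce the paper's learned embedding or clustering details.

```python
import math


def assign_pixels_to_centers(pixels, offsets, centers):
    """Assign each pixel to the instance center closest to its spatial embedding.

    pixels:  list of (x, y) pixel coordinates.
    offsets: list of (dx, dy) predicted offsets, one per pixel, so that
             pixel + offset should land near the pixel's instance center.
    centers: list of (x, y) instance centers, e.g. intensity-field peaks.
    Returns the index of the assigned center for each pixel.
    """
    labels = []
    for (x, y), (dx, dy) in zip(pixels, offsets):
        ex, ey = x + dx, y + dy  # the pixel's spatial embedding
        distances = [math.hypot(ex - cx, ey - cy) for cx, cy in centers]
        labels.append(distances.index(min(distances)))
    return labels
```

The same rule could group detected joints into skeletons if each joint predicts an offset toward its player's center.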
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of these approaches has made it possible to solve high-dimensional PDEs, opening doors to a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. To tackle these shortcomings, we show that Tensor Neural Networks (TNN) can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). We also show that TNN can be trained faster than DNN for the same accuracy. In addition, we introduce the Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance than a DNN with an equivalent parameter count. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
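The abstract does not spell out the TNN architecture. As an assumption, the sketch below uses a two-core tensor-train (matrix product operator) factorization of a dense layer's weight matrix, which is one common way tensor networks reduce parameter counts. It checks that contracting the cores one at a time gives the same output as the materialized dense weight. All function names are hypothetical. For a 256 x 256 layer split as 16 x 16 on each side with bond dimension 4, this factorization stores 2,048 numbers instead of 65,536.

```python
def tt_param_counts(n1, n2, m1, m2, r):
    """Parameters of a dense (n1*n2) x (m1*m2) weight vs. its two-core factorization."""
    dense = (n1 * n2) * (m1 * m2)
    factorized = n1 * m1 * r + r * n2 * m2
    return dense, factorized


def tt_dense_weight(A, B, n1, n2, m1, m2, r):
    """Materialize W[(i1, i2), (j1, j2)] = sum_k A[i1][j1][k] * B[k][i2][j2]."""
    W = [[0.0] * (m1 * m2) for _ in range(n1 * n2)]
    for i1 in range(n1):
        for i2 in range(n2):
            for j1 in range(m1):
                for j2 in range(m2):
                    W[i1 * n2 + i2][j1 * m2 + j2] = sum(
                        A[i1][j1][k] * B[k][i2][j2] for k in range(r)
                    )
    return W


def tt_forward(x, A, B, n1, n2, m1, m2, r):
    """Compute y = x W by contracting one core at a time, never building W."""
    # Contract the first input index i1 with core A, giving t[i2][j1][k].
    t = [[[sum(x[i1 * n2 + i2] * A[i1][j1][k] for i1 in range(n1))
           for k in range(r)]
          for j1 in range(m1)]
         for i2 in range(n2)]
    # Contract the second input index i2 and the bond index k with core B.
    y = [0.0] * (m1 * m2)
    for j1 in range(m1):
        for j2 in range(m2):
            y[j1 * m2 + j2] = sum(t[i2][j1][k] * B[k][i2][j2]
                                  for i2 in range(n2) for k in range(r))
    return y
```

The bond dimension `r` trades accuracy for size. With `r` large enough the factorization can represent any dense weight, and with small `r` it stores far fewer numbers.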
Physics-Informed Neural Networks (PINNs) have gained much attention in various fields of engineering thanks to their ability to incorporate physical laws into the models. PINNs integrate the physical constraints by minimizing the residuals of the partial differential equations (PDEs) on a set of collocation points. The distribution of these collocation points appears to have a large impact on the performance of PINNs, and the assessment of sampling methods for these points is still an active topic. In this paper, we propose a Fixed-Budget Online Adaptive Mesh Learning (FBOAML) method, which decomposes the domain into sub-domains and selects training collocation points based on the local maxima and local minima of the PDE residuals. The stopping criterion is based on a reference data set, which leads to an adaptive number of iterations for each specific problem. The effectiveness of FBOAML is demonstrated on both non-parameterized and parameterized problems. We also investigate the impact of the hyper-parameters of FBOAML and compare it with other adaptive sampling methods. The numerical results show important gains in the accuracy of PINNs with FBOAML over classical PINNs with non-adaptive collocation points. We also apply FBOAML to a complex industrial application involving coupling between mechanical and thermal fields. We show that FBOAML is able to identify high-gradient locations and even gives better predictions for some physical fields than classical PINNs with collocation points taken on a pre-adapted finite element mesh.
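A single refinement step in this spirit can be sketched in one dimension. Within each sub-domain, the collocation point with the lowest residual is traded for the candidate location with the highest residual, which keeps the point budget fixed. The swap rule, the uniform candidate grid, and the function name are hypothetical simplifications. The sketch omits PINN training and the reference-data stopping criterion, and it is not the authors' algorithm.

```python
def fixed_budget_update(points, residual, domain, n_sub, n_candidates=50):
    """One fixed-budget refinement step for 1D collocation points (hypothetical).

    In each of n_sub equal sub-domains, the existing point with the smallest
    |residual| (a local minimum) is swapped for the candidate with the largest
    |residual| (a local maximum), so the number of points never changes.
    """
    a, b = domain
    width = (b - a) / n_sub
    new_points = list(points)
    for s in range(n_sub):
        lo, hi = a + s * width, a + (s + 1) * width
        last = s == n_sub - 1
        inside = [i for i, x in enumerate(new_points)
                  if lo <= x < hi or (last and x == hi)]
        if not inside:
            continue  # no existing point to swap out in this sub-domain
        worst = min(inside, key=lambda i: abs(residual(new_points[i])))
        # Candidates: midpoints of a uniform grid inside the sub-domain.
        candidates = [lo + (k + 0.5) * width / n_candidates for k in range(n_candidates)]
        best = max(candidates, key=lambda x: abs(residual(x)))
        if abs(residual(best)) > abs(residual(new_points[worst])):
            new_points[worst] = best
    return sorted(new_points)
```

In a PINN loop, `residual` would be the network's PDE residual, and this step would alternate with training until a stopping criterion is met.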
To reduce dependence on fossil fuels and limit carbon emissions, fuel cells are a very promising technology and appear to be a key candidate to meet growing energy demand and promote the energy transition. To meet future needs for both transport and stationary applications, the time to market of fuel cell stacks must be drastically reduced. Here, we present a new concept that shortens their development time by introducing a disruptive, high-efficiency data augmentation approach based on artificial intelligence. Our results reduce the testing time required before a product can be introduced to the market from a thousand hours to a few hours. The innovative concept proposed here can support engineering and research tasks during the fuel cell development process, lowering development costs and reducing time to market.
We study the multiclass classification problem where the features come from a mixture of time-homogeneous diffusions. Specifically, the classes are discriminated by their drift functions, while the diffusion coefficient is common to all classes and unknown. In this framework, we build a plug-in classifier that relies on nonparametric estimators of the drift and diffusion functions. We first establish the consistency of our classification procedure under mild assumptions and then provide rates of convergence under different sets of assumptions. Finally, a numerical study supports our theoretical findings.
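A plug-in rule of this kind can be sketched by scoring each class with a discretized Girsanov log-likelihood of the observed path. The score uses that class's drift and the shared diffusion coefficient, and the predicted class is the one with the largest score plus log prior. The exact form of the authors' classifier is an assumption here. The drift functions, the squared diffusion coefficient, and the priors are passed in as plain functions and numbers. In the paper's setting they would be replaced by nonparametric estimates, which this sketch does not implement.

```python
import math


def girsanov_score(path, dt, drift, sigma2):
    """Discretized Girsanov log-likelihood of a path sampled every dt:
    sum_i [ b(X_i) / sigma2(X_i) * (X_{i+1} - X_i) - 0.5 * b(X_i)**2 / sigma2(X_i) * dt ].
    """
    score = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        b, s2 = drift(x0), sigma2(x0)
        score += b / s2 * (x1 - x0) - 0.5 * b * b / s2 * dt
    return score


def plug_in_classify(path, dt, drifts, sigma2, priors):
    """Return the class index k maximizing log(prior_k) + score under drift_k."""
    scores = [math.log(p) + girsanov_score(path, dt, b, sigma2)
              for b, p in zip(drifts, priors)]
    return scores.index(max(scores))
```

Because the diffusion coefficient is shared, only the drift terms differ between classes. This is why the classes can be discriminated by their drifts alone.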
We introduce the XPER (eXplainable PERformance) methodology to measure the specific contribution of the input features to the predictive or economic performance of a model. Our methodology offers several advantages. First, it is both model-agnostic and performance metric-agnostic. Second, XPER is theoretically founded as it is based on Shapley values. Third, the interpretation of the benchmark, which is inherent in any Shapley value decomposition, is meaningful in our context. Fourth, XPER is not plagued by model specification error, as it does not require re-estimating the model. Fifth, it can be implemented either at the model level or at the individual level. In an application based on auto loans, we find that performance can be explained by a surprisingly small number of features. XPER decompositions are rather stable across metrics, yet some feature contributions switch sign across metrics. Our analysis also shows that explaining model forecasts and model performance are two distinct tasks.
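Because XPER builds on Shapley values, the underlying decomposition can be sketched with an exact Shapley computation over a small set of features. Here the value of a coalition is the performance obtained with those features. The toy performance function and feature names are hypothetical, and the sketch does not show how XPER actually evaluates performance for a coalition without re-estimating the model. It only illustrates the decomposition and the role of the benchmark, i.e. the value of the empty coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values of value(coalition), where coalition is a frozenset."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_performance(coalition):
    """Hypothetical performance metric (e.g. an AUC) reached with a feature coalition."""
    score = 0.5  # benchmark: performance of the empty coalition
    if "income" in coalition:
        score += 0.2
    if "age" in coalition:
        score += 0.1
    if "income" in coalition and "age" in coalition:
        score += 0.05  # interaction term, split equally between the two features
    return score
```

The contributions sum to the full-model performance minus the benchmark, here 0.85 - 0.5. Exact enumeration needs O(2^n) evaluations of the value function, so it is only practical for a handful of features.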
We propose a novel method for high-quality facial texture reconstruction from RGB images using a new capture routine based on a single smartphone, which we equip with an inexpensive polarization foil. Specifically, we turn the flashlight into a polarized light source and add a polarization filter on top of the camera. Leveraging this setup, we capture the face of a subject with cross-polarized and parallel-polarized light. For each subject, we record two short sequences in a dark environment under flash illumination with different light polarizations using the modified smartphone. Based on these observations, we reconstruct an explicit surface mesh of the face using structure from motion. We then exploit the co-location of camera and light within a differentiable renderer to optimize the facial textures using an analysis-by-synthesis approach. Our method optimizes for high-resolution normal textures, diffuse albedo, and specular albedo using a coarse-to-fine optimization scheme. We show that the optimized textures can be used in a standard rendering pipeline to synthesize high-quality, photo-realistic 3D digital humans in novel environments.
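The value of capturing both polarization states can be sketched with the standard separation principle for polarized illumination. A cross-polarized view blocks specular reflection, while a parallel-polarized view keeps it. The method described above fits textures with an analysis-by-synthesis optimization rather than this subtraction. The pixel-wise split below only illustrates what the two captures encode, assuming aligned images, ideal filters, and intensities given as flat lists. The function name is hypothetical.

```python
def separate_reflectance(parallel, cross):
    """Split per-pixel intensities into diffuse and specular parts.

    Assumes ideal filters and aligned images: the cross-polarized image blocks
    specular reflection, so it holds only (a fixed fraction of) the diffuse
    light, while the parallel-polarized image holds the same diffuse part plus
    the specular reflection.
    """
    diffuse, specular = [], []
    for p, c in zip(parallel, cross):
        diffuse.append(c)                 # diffuse component, up to a constant scale
        specular.append(max(p - c, 0.0))  # clamp negative values caused by noise
    return diffuse, specular
```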
Setting weights to zero when training a neural network helps reduce the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains SoA results, in terms of both accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably with the most recent methods, which adopt differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification primarily lie in giving the weights the freedom to evolve smoothly across the zero state while progressively increasing the sparsity ratio. Source code and weights are available at https://github.com/vanderschuea/stthree
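The mechanism described above has three parts: soft-threshold the raw weights in the forward pass, apply the gradient straight through to the raw weights, and ramp the sparsity ratio up during a single run. The toy sketch below illustrates it. The linear-regression task, the linear sparsity ramp, the magnitude-quantile rule for picking the threshold, and all function names are assumptions made for illustration. This is not the ST-3 implementation; the repository linked above holds the authors' code.

```python
def soft_threshold(w, tau):
    """Shrink w toward zero by tau; weights with |w| <= tau become exactly zero."""
    if w > tau:
        return w - tau
    if w < -tau:
        return w + tau
    return 0.0


def threshold_for_sparsity(raw, sparsity):
    """Pick tau so that at least a `sparsity` fraction of weights is zeroed (ties may zero more)."""
    k = int(sparsity * len(raw))
    if k == 0:
        return 0.0
    return sorted(abs(w) for w in raw)[k - 1]


def train_st3_sketch(xs, ys, n_steps=200, lr=0.05, final_sparsity=0.5):
    """Toy linear regression trained with soft-thresholding and straight-through updates."""
    dim = len(xs[0])
    raw = [0.1] * dim  # raw (non-thresholded) weights that the optimizer updates
    for step in range(n_steps):
        # Progressive schedule: sparsity ramps linearly from 0 to final_sparsity.
        sparsity = final_sparsity * step / (n_steps - 1)
        tau = threshold_for_sparsity(raw, sparsity)
        eff = [soft_threshold(w, tau) for w in raw]  # weights used in the forward pass
        # Mean-squared-error gradient with respect to the thresholded weights.
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = sum(e * xi for e, xi in zip(eff, x)) - y
            for i in range(dim):
                grad[i] += 2.0 * err * x[i] / len(xs)
        # Straight-through: apply that gradient directly to the raw weights, so a
        # zeroed weight still receives updates and can cross back over the threshold.
        raw = [w - lr * g for w, g in zip(raw, grad)]
    tau = threshold_for_sparsity(raw, final_sparsity)
    return [soft_threshold(w, tau) for w in raw]
```

Soft-thresholding shrinks the surviving weights by `tau`. The straight-through updates let the raw weights grow to compensate, so the effective weights still approach their targets.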
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.